Dataset statistics
| Number of variables | 13 |
|---|---|
| Number of observations | 368757 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 34.1 MiB |
| Average record size in memory | 97.0 B |
Variable types
| Text | 5 |
|---|---|
| Numeric | 6 |
| Categorical | 1 |
| Boolean | 1 |
Alerts
| Alert | Type |
|---|---|
| target is highly overall correlated with target_log | High correlation |
| target_log is highly overall correlated with target | High correlation |
| Pool is highly imbalanced (50.5%) | Imbalance |
| sqft is highly skewed (γ1 = 116.8531877) | Skewed |
| target is highly skewed (γ1 = 25.15742786) | Skewed |
| dist_sch_min is highly skewed (γ1 = 197.6766553) | Skewed |
| sqft has 10848 (2.9%) zeros | Zeros |
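The γ1 figures above are skewness coefficients. As a rough illustration of how a few extreme values can drive skewness up, the sketch below computes the population (uncorrected) moment coefficient of skewness on two small made-up samples. The function name `skewness` and both sample lists are hypothetical, and the profiler may apply a bias correction, so this is not expected to reproduce the reported figures exactly.

```python
def skewness(values):
    """Population moment coefficient of skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]        # hypothetical sample, no tail
right_tailed = [1, 1, 1, 1, 100]   # hypothetical sample, one extreme value

print(skewness(symmetric))
print(skewness(right_tailed))
```

A symmetric sample gives a coefficient of zero, while a single large outlier is enough to make it clearly positive.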
Reproduction
| Analysis started | 2024-06-02 06:23:39.670241 |
|---|---|
| Analysis finished | 2024-06-02 06:31:08.786484 |
| Duration | 7 minutes and 29.12 seconds |
| Software version | ydata-profiling v4.8.3 |
| Download configuration | config.json |
status
Text
| Distinct | 94 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 2.8 MiB |
Length
| Max length | 31 |
|---|---|
| Median length | 7 |
| Mean length | 7.3296995 |
| Min length | 1 |
Characters and Unicode
| Total characters | 2702878 |
|---|---|
| Distinct characters | 31 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 16 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | active |
|---|---|
| 2nd row | forsale |
| 3rd row | forsale |
| 4th row | forsale |
| 5th row | forsale |
| Value | Count | Frequency (%) |
|---|---|---|
| forsale | 197077 | 53.4% |
| active | 103224 | 28.0% |
| undefined | 38535 | 10.4% |
| foreclosure | 5991 | 1.6% |
| newconstruction | 5357 | 1.5% |
| pending | 4742 | 1.3% |
| pre-foreclosure | 2000 | 0.5% |
| undercontractshow | 1933 | 0.5% |
| p | 1484 | 0.4% |
| auction | 1292 | 0.4% |
| Other values (84) | 7122 | 1.9% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 419633 | |
| a | 309463 | |
| f | 246010 | |
| o | 238693 | |
| r | 233513 | |
| s | 215199 | |
| l | 206796 | |
| i | 159367 | 5.9% |
| c | 138266 | 5.1% |
| t | 128732 | 4.8% |
| Other values (21) | 407206 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 2702878 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 2702878 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 2702878 |
propertyType
Text
| Distinct | 164 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 2.8 MiB |
Length
| Max length | 46 |
|---|---|
| Median length | 13 |
| Mean length | 10.419843 |
| Min length | 3 |
Characters and Unicode
| Total characters | 3842390 |
|---|---|
| Distinct characters | 42 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 28 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | single family |
|---|---|
| 2nd row | single family |
| 3rd row | single family |
| 4th row | single family |
| 5th row | lot/land |
| Value | Count | Frequency (%) |
| family | 195849 | |
| single | 186831 | |
| condo | 41920 | 7.2% |
| unknown | 33517 | 5.8% |
| lot/land | 19560 | 3.4% |
| townhouse | 18077 | 3.1% |
| multi | 12103 | 2.1% |
| land | 9941 | 1.7% |
| condo/townhome | 8253 | 1.4% |
| traditional | 6045 | 1.0% |
| Other values (193) | 46918 | 8.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| l | 460715 | |
| i | 423188 | |
| n | 413556 | |
| a | 259300 | 6.7% |
| e | 246293 | 6.4% |
| o | 246094 | 6.4% |
| m | 230243 | 6.0% |
| s | 218275 | 5.7% |
| (space) | 210257 | 5.5% |
| y | 202393 | 5.3% |
| Other values (32) | 932076 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 3842390 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 3842390 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 3842390 |
baths
Text
| Distinct | 114 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 2.8 MiB |
Length
| Max length | 7 |
|---|---|
| Median length | 1 |
| Mean length | 3.192197 |
| Min length | 1 |
Characters and Unicode
| Total characters | 1177145 |
|---|---|
| Distinct characters | 18 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 18 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | 3.5 |
|---|---|
| 2nd row | 3 |
| 3rd row | 2 |
| 4th row | 8 |
| 5th row | unknown |
| Value | Count | Frequency (%) |
| unknown | 103909 | |
| 2 | 83965 | |
| 3 | 53303 | |
| 4 | 21143 | 5.7% |
| 2.0 | 16195 | 4.4% |
| 2.5 | 12599 | 3.4% |
| 3.0 | 10631 | 2.9% |
| 1 | 10406 | 2.8% |
| 5 | 7624 | 2.1% |
| 1.0 | 5759 | 1.6% |
| Other values (92) | 43223 |
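The table above counts the same bath number under different spellings (2 and 2.0, 3 and 3.0, 1 and 1.0) alongside the placeholder `unknown`. A minimal sketch of one way to normalize such strings follows; the helper name `parse_baths` is hypothetical, and treating `unknown` as missing is an assumption that the report does not confirm.

```python
def parse_baths(raw):
    """Return the bath count as a float, or None when the text is not a number."""
    text = raw.strip().lower()
    try:
        return float(text)
    except ValueError:
        # Assumption: non-numeric entries such as "unknown" mean "missing".
        return None

samples = ["2", "2.0", "3.5", "unknown"]
print([parse_baths(s) for s in samples])
```

After this step, "2" and "2.0" collapse to the same value, so the column can be profiled as numeric rather than as text.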
Most occurring characters
| Value | Count | Frequency (%) |
| n | 311727 | |
| 2 | 120771 | 10.3% |
| u | 103909 | 8.8% |
| k | 103909 | 8.8% |
| o | 103909 | 8.8% |
| w | 103909 | 8.8% |
| 0 | 72533 | 6.2% |
| 3 | 71784 | 6.1% |
| . | 62807 | 5.3% |
| 5 | 41849 | 3.6% |
| Other values (8) | 80038 | 6.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 1177145 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 1177145 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 1177145 |
city
Text
| Distinct | 1864 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 2.8 MiB |
Length
| Max length | 38 |
|---|---|
| Median length | 29 |
| Mean length | 9.0020474 |
| Min length | 1 |
Characters and Unicode
| Total characters | 3319568 |
|---|---|
| Distinct characters | 60 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 369 |
|---|---|
| Unique (%) | 0.1% |
Sample
| 1st row | Southern Pines |
|---|---|
| 2nd row | Spokane Valley |
| 3rd row | Los Angeles |
| 4th row | Dallas |
| 5th row | Palm Bay |
| Value | Count | Frequency (%) |
| houston | 23907 | 4.8% |
| miami | 20472 | 4.1% |
| san | 18986 | 3.8% |
| antonio | 15229 | 3.1% |
| fort | 11307 | 2.3% |
| jacksonville | 10092 | 2.0% |
| charlotte | 9415 | 1.9% |
| beach | 8673 | 1.8% |
| dallas | 8517 | 1.7% |
| brooklyn | 7150 | 1.4% |
| Other values (1674) | 359772 |
Most occurring characters
| Value | Count | Frequency (%) |
| a | 346516 | 10.4% |
| o | 285836 | 8.6% |
| n | 256457 | 7.7% |
| e | 251948 | 7.6% |
| l | 223332 | 6.7% |
| i | 217604 | 6.6% |
| t | 198501 | 6.0% |
| s | 159868 | 4.8% |
| r | 157651 | 4.7% |
| (space) | 124803 | 3.8% |
| Other values (50) | 1097052 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 3319568 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 3319568 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 3319568 |
sqft
Real number (ℝ)
SKEWED  ZEROS 
| Distinct | 9877 |
|---|---|
| Distinct (%) | 2.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2583.1376 |
| Minimum | -1 |
|---|---|
| Maximum | 5728968 |
| Zeros | 10848 |
| Zeros (%) | 2.9% |
| Negative | 39006 |
| Negative (%) | 10.6% |
| Memory size | 2.8 MiB |
Quantile statistics
| Minimum | -1 |
|---|---|
| 5-th percentile | -1 |
| Q1 | 1049 |
| median | 1665 |
| Q3 | 2471 |
| 95-th percentile | 4546 |
| Maximum | 5728968 |
| Range | 5728969 |
| Interquartile range (IQR) | 1422 |
Descriptive statistics
| Standard deviation | 22781.321 |
|---|---|
| Coefficient of variation (CV) | 8.8192439 |
| Kurtosis | 21345.53 |
| Mean | 2583.1376 |
| Median Absolute Deviation (MAD) | 693 |
| Skewness | 116.85319 |
| Sum | 9.5255008 × 10⁸ |
| Variance | 5.1898858 × 10⁸ |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| -1 | 39006 | 10.6% |
| 0 | 10848 | 2.9% |
| 1200 | 1403 | 0.4% |
| 1000 | 1009 | 0.3% |
| 1500 | 988 | 0.3% |
| 1800 | 967 | 0.3% |
| 1100 | 923 | 0.3% |
| 1400 | 889 | 0.2% |
| 2000 | 857 | 0.2% |
| 1600 | 822 | 0.2% |
| Other values (9867) | 311045 |
| Value | Count | Frequency (%) |
| -1 | 39006 | |
| 0 | 10848 | 2.9% |
| 1 | 76 | < 0.1% |
| 2 | 6 | < 0.1% |
| 3 | 2 | < 0.1% |
| 4 | 1 | < 0.1% |
| 5 | 2 | < 0.1% |
| 6 | 1 | < 0.1% |
| 10 | 2 | < 0.1% |
| 11 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 5728968 | 1 | |
| 4356000 | 2 | |
| 2807917 | 2 | |
| 2613600 | 1 | |
| 2585006 | 2 | |
| 1916640 | 1 | |
| 1761113 | 1 | |
| 1611720 | 1 | |
| 1598652 | 1 | |
| 1524600 | 1 |
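The value -1 accounts for 10.6% of sqft and 0 for a further 2.9%, which looks more like placeholder coding than real areas. The sketch below shows how masking such codes changes a summary statistic. It assumes that -1 and 0 both stand for "missing", which the report does not state; the helper `mask_placeholders` and the sample list are hypothetical.

```python
from statistics import median

def mask_placeholders(values, placeholders=(-1, 0)):
    """Replace placeholder codes with None so they can be excluded from statistics."""
    return [None if v in placeholders else v for v in values]

raw = [2900, 1947, -1, 0, 3000, -1, 897]   # hypothetical sample of sqft values
cleaned = mask_placeholders(raw)
valid = [v for v in cleaned if v is not None]

print(median(raw))     # median with placeholders included
print(median(valid))   # median over real measurements only
```

The same masking would also lower the reported share of zeros and negatives, so it is worth deciding on the meaning of these codes before interpreting the quantiles.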
zipcode
Real number (ℝ)
| Distinct | 4258 |
|---|---|
| Distinct (%) | 1.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 51492.224 |
| Minimum | 1103 |
|---|---|
| Maximum | 331446 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 2.8 MiB |
Quantile statistics
| Minimum | 1103 |
|---|---|
| 5-th percentile | 11234 |
| Q1 | 32833 |
| median | 37205 |
| Q3 | 77382 |
| 95-th percentile | 95403 |
| Maximum | 331446 |
| Range | 330343 |
| Interquartile range (IQR) | 44549 |
Descriptive statistics
| Standard deviation | 26863.544 |
|---|---|
| Coefficient of variation (CV) | 0.52170099 |
| Kurtosis | -1.30296 |
| Mean | 51492.224 |
| Median Absolute Deviation (MAD) | 17195 |
| Skewness | 0.29588501 |
| Sum | 1.8988118 × 10¹⁰ |
| Variance | 7.2165002 × 10⁸ |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 32137 | 2097 | 0.6% |
| 33131 | 1542 | 0.4% |
| 34747 | 1471 | 0.4% |
| 78245 | 1360 | 0.4% |
| 33137 | 1298 | 0.4% |
| 33132 | 1294 | 0.4% |
| 78253 | 1253 | 0.3% |
| 34759 | 1241 | 0.3% |
| 78254 | 1212 | 0.3% |
| 33130 | 1155 | 0.3% |
| Other values (4248) | 354834 |
| Value | Count | Frequency (%) |
| 1103 | 1 | < 0.1% |
| 1104 | 10 | |
| 1105 | 10 | |
| 1106 | 1 | < 0.1% |
| 1107 | 2 | < 0.1% |
| 1108 | 15 | |
| 1109 | 21 | |
| 1118 | 8 | < 0.1% |
| 1119 | 7 | < 0.1% |
| 1128 | 5 | < 0.1% |
| Value | Count | Frequency (%) |
| 331446 | 1 | < 0.1% |
| 123456 | 1 | < 0.1% |
| 112229 | 1 | < 0.1% |
| 99338 | 103 | |
| 99337 | 146 | |
| 99336 | 126 | |
| 99224 | 122 | |
| 99223 | 93 | |
| 99218 | 33 | < 0.1% |
| 99217 | 82 |
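Because zipcode is stored as a number, the smallest values (1103, 1104, ...) have four digits, while the three largest (331446, 123456, 112229) have six. The sketch below restores a leading zero and flags over-long values. It assumes five-digit US ZIP codes, and the helper name `normalize_zip` is hypothetical.

```python
def normalize_zip(value):
    """Return a 5-character ZIP string, or None if the number cannot be a 5-digit ZIP."""
    if value < 0 or value > 99999:
        return None  # six or more digits, or negative: not a valid 5-digit code
    return str(int(value)).zfill(5)  # pad with leading zeros, e.g. 1103 -> "01103"

for z in (1103, 32137, 331446):
    print(z, normalize_zip(z))
```

Treating the column as text after this step also avoids meaningless statistics such as the mean or variance of ZIP codes.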
state
Categorical
| Distinct | 38 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 2.8 MiB |
Length
| Max length | 2 |
|---|---|
| Median length | 2 |
| Mean length | 2 |
| Min length | 2 |
Characters and Unicode
| Total characters | 737514 |
|---|---|
| Distinct characters | 26 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 4 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | NC |
|---|---|
| 2nd row | WA |
| 3rd row | CA |
| 4th row | TX |
| 5th row | FL |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| FL | 113015 | 30.6% |
| TX | 81529 | 22.1% |
| NY | 24002 | 6.5% |
| CA | 23094 | 6.3% |
| NC | 21388 | 5.8% |
| TN | 17673 | 4.8% |
| WA | 13563 | 3.7% |
| OH | 12282 | 3.3% |
| IL | 8799 | 2.4% |
| NV | 8335 | 2.3% |
| Other values (28) | 45077 | 12.2% |
Words
| Value | Count | Frequency (%) |
| fl | 113016 | |
| tx | 81529 | |
| ny | 24002 | 6.5% |
| ca | 23094 | 6.3% |
| nc | 21388 | 5.8% |
| tn | 17673 | 4.8% |
| wa | 13563 | 3.7% |
| oh | 12282 | 3.3% |
| il | 8799 | 2.4% |
| nv | 8335 | 2.3% |
| Other values (27) | 45076 | 12.2% |
Most occurring characters
| Value | Count | Frequency (%) |
| L | 121815 | |
| F | 113016 | |
| T | 101348 | |
| X | 81529 | |
| N | 75047 | |
| C | 55340 | |
| A | 54449 | |
| Y | 24089 | 3.3% |
| O | 22186 | 3.0% |
| I | 17807 | 2.4% |
| Other values (16) | 70888 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 737514 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 737514 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 737514 |
target
Real number (ℝ)
HIGH CORRELATION  SKEWED 
| Distinct | 34219 |
|---|---|
| Distinct (%) | 9.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 649131.57 |
| Minimum | 1 |
|---|---|
| Maximum | 1.95 × 10⁸ |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 2.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 39000 |
| Q1 | 189310 |
| median | 324750 |
| Q3 | 587500 |
| 95-th percentile | 1975000 |
| Maximum | 1.95 × 10⁸ |
| Range | 1.95 × 10⁸ |
| Interquartile range (IQR) | 398190 |
Descriptive statistics
| Standard deviation | 1848210.2 |
|---|---|
| Coefficient of variation (CV) | 2.8472044 |
| Kurtosis | 1346.3293 |
| Mean | 649131.57 |
| Median Absolute Deviation (MAD) | 169750 |
| Skewness | 25.157428 |
| Sum | 2.3937181 × 10¹¹ |
| Variance | 3.4158811 × 10¹² |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 225000 | 1778 | 0.5% |
| 350000 | 1632 | 0.4% |
| 275000 | 1630 | 0.4% |
| 250000 | 1606 | 0.4% |
| 325000 | 1546 | 0.4% |
| 399000 | 1528 | 0.4% |
| 299900 | 1522 | 0.4% |
| 249900 | 1480 | 0.4% |
| 299000 | 1439 | 0.4% |
| 450000 | 1428 | 0.4% |
| Other values (34209) | 353168 |
| Value | Count | Frequency (%) |
| 1 | 15 | |
| 3 | 2 | < 0.1% |
| 8 | 1 | < 0.1% |
| 20 | 1 | < 0.1% |
| 25 | 1 | < 0.1% |
| 29 | 1 | < 0.1% |
| 30 | 1 | < 0.1% |
| 250 | 1 | < 0.1% |
| 393 | 1 | < 0.1% |
| 400 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 195000000 | 1 | |
| 165000000 | 2 | |
| 150000000 | 1 | |
| 129000000 | 1 | |
| 115000000 | 2 | |
| 110000000 | 2 | |
| 98000000 | 1 | |
| 88000000 | 1 | |
| 87000000 | 1 | |
| 85000000 | 1 |
Pool
Boolean
IMBALANCE 
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 360.2 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| False | 328840 | 89.2% |
| True | 39917 | 10.8% |
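The 50.5% imbalance reported for Pool is not the share of either class (True is 10.8% of rows). It is consistent with one minus the normalized Shannon entropy of the class counts. The sketch below computes that quantity; the function name `imbalance_score` is hypothetical, and the formula is an assumption about how the profiler scores imbalance rather than something the report states.

```python
from math import log2

def imbalance_score(counts):
    """1 - normalized Shannon entropy: 0 for balanced classes, near 1 for one dominant class."""
    if len(counts) < 2:
        raise ValueError("need at least two classes")
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * log2(p) for p in probs)
    return 1 - entropy / log2(len(counts))  # normalize by the maximum possible entropy

score = imbalance_score([328840, 39917])   # False and True counts from the table
print(f"{score:.1%}")
```

Under this definition a 50/50 split scores 0%, so a score around 50% for a roughly 89/11 split is plausible.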
Year built
Text
| Distinct | 221 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 2.8 MiB |
Length
| Max length | 4 |
|---|---|
| Median length | 4 |
| Mean length | 4 |
| Min length | 4 |
Characters and Unicode
| Total characters | 1475028 |
|---|---|
| Distinct characters | 14 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 14 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | 2019 |
|---|---|
| 2nd row | 2019 |
| 3rd row | 1961 |
| 4th row | 2006 |
| 5th row | None |
| Value | Count | Frequency (%) |
| none | 59798 | 16.2% |
| 2019 | 30904 | 8.4% |
| 2006 | 7925 | 2.1% |
| 2005 | 7428 | 2.0% |
| 2007 | 7065 | 1.9% |
| 2018 | 6731 | 1.8% |
| 2004 | 5461 | 1.5% |
| 2017 | 5119 | 1.4% |
| 2016 | 5064 | 1.4% |
| 2008 | 4953 | 1.3% |
| Other values (211) | 228309 |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 283437 | |
| 9 | 270242 | |
| 0 | 214884 | |
| 2 | 157027 | |
| 8 | 64099 | 4.3% |
| 5 | 62249 | 4.2% |
| N | 59798 | 4.1% |
| o | 59798 | 4.1% |
| n | 59798 | 4.1% |
| e | 59798 | 4.1% |
| Other values (4) | 183898 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 1475028 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 1475028 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 1475028 |
r_sch_mean
Real number (ℝ)
| Distinct | 79 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.8407444 |
| Minimum | -1 |
|---|---|
| Maximum | 9 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 5461 |
| Negative (%) | 1.5% |
| Memory size | 2.8 MiB |
Quantile statistics
| Minimum | -1 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 3.3 |
| median | 4.8 |
| Q3 | 6.3 |
| 95-th percentile | 8 |
| Maximum | 9 |
| Range | 10 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 1.9693033 |
|---|---|
| Coefficient of variation (CV) | 0.40681827 |
| Kurtosis | -0.060039204 |
| Mean | 4.8407444 |
| Median Absolute Deviation (MAD) | 1.5 |
| Skewness | -0.11084441 |
| Sum | 1785058.4 |
| Variance | 3.8781554 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 5 | 23487 | 6.4% |
| 4 | 21855 | 5.9% |
| 6 | 20206 | 5.5% |
| 3 | 19323 | 5.2% |
| 3.3 | 14948 | 4.1% |
| 6.3 | 14861 | 4.0% |
| 4.7 | 14754 | 4.0% |
| 3.7 | 14133 | 3.8% |
| 7 | 13914 | 3.8% |
| 5.7 | 13723 | 3.7% |
| Other values (69) | 197553 |
| Value | Count | Frequency (%) |
| -1 | 5461 | |
| 1 | 2232 | |
| 1.2 | 18 | < 0.1% |
| 1.3 | 1067 | 0.3% |
| 1.4 | 12 | < 0.1% |
| 1.5 | 1948 | 0.5% |
| 1.6 | 72 | < 0.1% |
| 1.7 | 2565 | |
| 1.8 | 281 | 0.1% |
| 1.9 | 24 | < 0.1% |
| Value | Count | Frequency (%) |
| 9 | 7103 | |
| 8.8 | 242 | 0.1% |
| 8.7 | 2335 | 0.6% |
| 8.6 | 169 | < 0.1% |
| 8.5 | 2993 | 0.8% |
| 8.4 | 276 | 0.1% |
| 8.3 | 2809 | 0.8% |
| 8.2 | 1404 | 0.4% |
| 8 | 11435 | |
| 7.9 | 1 | < 0.1% |
dist_sch_min
Real number (ℝ)
SKEWED 
| Distinct | 1539 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.1919819 |
| Minimum | -1 |
|---|---|
| Maximum | 1590.38 |
| Zeros | 761 |
| Zeros (%) | 0.2% |
| Negative | 4256 |
| Negative (%) | 1.2% |
| Memory size | 2.8 MiB |
Quantile statistics
| Minimum | -1 |
|---|---|
| 5-th percentile | 0.1 |
| Q1 | 0.34 |
| median | 0.67 |
| Q3 | 1.3 |
| 95-th percentile | 3.62 |
| Maximum | 1590.38 |
| Range | 1591.38 |
| Interquartile range (IQR) | 0.96 |
Descriptive statistics
| Standard deviation | 5.4420641 |
|---|---|
| Coefficient of variation (CV) | 4.5655594 |
| Kurtosis | 49970.675 |
| Mean | 1.1919819 |
| Median Absolute Deviation (MAD) | 0.38 |
| Skewness | 197.67666 |
| Sum | 439551.67 |
| Variance | 29.616062 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0.3 | 22015 | 6.0% |
| 0.4 | 20522 | 5.6% |
| 0.2 | 19862 | 5.4% |
| 0.5 | 18396 | 5.0% |
| 0.6 | 15751 | 4.3% |
| 0.7 | 13490 | 3.7% |
| 0.1 | 12336 | 3.3% |
| 0.8 | 10824 | 2.9% |
| 0.9 | 9415 | 2.6% |
| 1.1 | 7369 | 2.0% |
| Other values (1529) | 218777 |
| Value | Count | Frequency (%) |
| -1 | 4256 | |
| 0 | 761 | 0.2% |
| 0.01 | 1 | < 0.1% |
| 0.02 | 23 | < 0.1% |
| 0.03 | 119 | < 0.1% |
| 0.04 | 183 | < 0.1% |
| 0.05 | 342 | 0.1% |
| 0.06 | 405 | 0.1% |
| 0.07 | 501 | 0.1% |
| 0.08 | 543 | 0.1% |
| Value | Count | Frequency (%) |
| 1590.38 | 1 | |
| 1590.36 | 1 | |
| 1187.14 | 1 | |
| 725.21 | 1 | |
| 725.2 | 1 | |
| 725.19 | 2 | |
| 725.17 | 1 | |
| 460.86 | 1 | |
| 312.4 | 1 | |
| 117.8 | 1 |
target_log
Real number (ℝ)
HIGH CORRELATION 
| Distinct | 34219 |
|---|---|
| Distinct (%) | 9.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 12.653457 |
| Minimum | 0 |
|---|---|
| Maximum | 19.08851 |
| Zeros | 15 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 2.8 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 10.571317 |
| Q1 | 12.151141 |
| median | 12.690811 |
| Q3 | 13.283632 |
| 95-th percentile | 14.496079 |
| Maximum | 19.08851 |
| Range | 19.08851 |
| Interquartile range (IQR) | 1.1324904 |
Descriptive statistics
| Standard deviation | 1.1973912 |
|---|---|
| Coefficient of variation (CV) | 0.094629569 |
| Kurtosis | 3.6717131 |
| Mean | 12.653457 |
| Median Absolute Deviation (MAD) | 0.56324052 |
| Skewness | -0.6858503 |
| Sum | 4666051 |
| Variance | 1.4337457 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 12.32385568 | 1778 | 0.5% |
| 12.76568843 | 1632 | 0.4% |
| 12.52452638 | 1630 | 0.4% |
| 12.4292162 | 1606 | 0.4% |
| 12.69158046 | 1546 | 0.4% |
| 12.8967167 | 1528 | 0.4% |
| 12.61120436 | 1522 | 0.4% |
| 12.42881612 | 1480 | 0.4% |
| 12.60819885 | 1439 | 0.4% |
| 13.01700286 | 1428 | 0.4% |
| Other values (34209) | 353168 |
| Value | Count | Frequency (%) |
| 0 | 15 | |
| 1.098612289 | 2 | < 0.1% |
| 2.079441542 | 1 | < 0.1% |
| 2.995732274 | 1 | < 0.1% |
| 3.218875825 | 1 | < 0.1% |
| 3.36729583 | 1 | < 0.1% |
| 3.401197382 | 1 | < 0.1% |
| 5.521460918 | 1 | < 0.1% |
| 5.973809612 | 1 | < 0.1% |
| 5.991464547 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 19.08851012 | 1 | |
| 18.92145603 | 2 | |
| 18.82614585 | 1 | |
| 18.67532296 | 1 | |
| 18.56044269 | 2 | |
| 18.51599092 | 2 | |
| 18.40047804 | 1 | |
| 18.29284737 | 1 | |
| 18.28141868 | 1 | |
| 18.25816181 | 1 |
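The target_log values line up with the natural logarithm of target: the 15 rows with target = 1 appear as 0, the most common price 225000 appears as 12.32385568, and the maximum 195000000 appears as 19.08851012. The sketch below checks this relationship for those three values. The helper name `to_log` is hypothetical, and the natural-log transform is inferred from the numbers rather than stated in the report.

```python
from math import log

def to_log(price):
    """Natural logarithm of a strictly positive price."""
    return log(price)

# Prices taken from the target tables: minimum, most common value, maximum.
for price in (1, 225000, 195000000):
    print(price, round(to_log(price), 8))
```

The transform removes most of the skew (25.16 for target versus -0.69 for target_log), but the near-zero prices become a cluster of low outliers on the log scale.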
| | Pool | dist_sch_min | r_sch_mean | sqft | state | target | target_log | zipcode |
|---|---|---|---|---|---|---|---|---|
| Pool | 1.000 | 0.101 | 0.106 | 0.154 | 0.199 | 0.166 | 0.166 | -0.006 |
| dist_sch_min | 0.101 | 1.000 | 0.128 | 0.043 | 0.000 | -0.102 | -0.102 | -0.024 |
| r_sch_mean | 0.106 | 0.128 | 1.000 | 0.221 | 0.205 | 0.305 | 0.305 | 0.073 |
| sqft | 0.154 | 0.043 | 0.221 | 1.000 | 0.007 | 0.499 | 0.499 | 0.128 |
| state | 0.199 | 0.000 | 0.205 | 0.007 | 1.000 | -0.070 | -0.070 | 0.269 |
| target | 0.166 | -0.102 | 0.305 | 0.499 | -0.070 | 1.000 | 1.000 | 0.007 |
| target_log | 0.166 | -0.102 | 0.305 | 0.499 | -0.070 | 1.000 | 1.000 | 0.007 |
| zipcode | -0.006 | -0.024 | 0.073 | 0.128 | 0.269 | 0.007 | 0.007 | 1.000 |
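In the matrix above, target and target_log have identical rows and a mutual coefficient of 1.000. That is what a rank-based coefficient yields for a strictly increasing transform such as a logarithm. The sketch below illustrates the effect with Spearman's rank correlation on tie-free data. It assumes the numeric pairs in the matrix use a rank-based measure, which the report does not state; `ranks` and `spearman` are hypothetical helpers.

```python
from math import log

def ranks(values):
    """1-based rank of each element, for a list without ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(x, y):
    """Spearman rank correlation for tie-free samples of equal length."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))  # squared rank differences
    return 1 - 6 * d2 / (n * (n * n - 1))

prices = [5000, 68000, 181500, 209000, 2895000]   # a few target values from the sample rows
logs = [log(p) for p in prices]
print(spearman(prices, logs))
```

Since the two columns carry the same ranking, keeping both as model inputs adds no information, and one of them should be treated as the target only.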
| | status | propertyType | baths | city | sqft | zipcode | state | target | Pool | Year built | r_sch_mean | dist_sch_min | target_log |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | active | single family | 3.5 | Southern Pines | 2900 | 28387 | NC | 418000 | False | 2019 | 5.2 | 2.70 | 12.943237 |
| 1 | forsale | single family | 3 | Spokane Valley | 1947 | 99216 | WA | 310000 | False | 2019 | 4.0 | 1.01 | 12.644328 |
| 2 | forsale | single family | 2 | Los Angeles | 3000 | 90049 | CA | 2895000 | True | 1961 | 6.7 | 1.19 | 14.878496 |
| 3 | forsale | single family | 8 | Dallas | 6457 | 75205 | TX | 2395000 | False | 2006 | 9.0 | 0.10 | 14.688894 |
| 4 | forsale | lot/land | unknown | Palm Bay | -1 | 32908 | FL | 5000 | False | None | 4.7 | 3.03 | 8.517193 |
| 5 | forsale | townhouse | unknown | Philadelphia | 897 | 19145 | PA | 209000 | False | 1920 | -1.0 | -1.00 | 12.250090 |
| 6 | active | florida | unknown | Poinciana | 1507 | 34759 | FL | 181500 | False | 2006 | 2.3 | 0.80 | 12.109011 |
| 7 | active | unknown | unknown | Memphis | -1 | 38115 | TN | 68000 | False | 1976 | 2.7 | 0.40 | 11.127263 |
| 8 | active | single family | 2 | Mason City | 3588 | 50401 | IA | 244900 | False | 1970 | 3.8 | 5.60 | 12.408605 |
| 9 | undefined | single family | 3 | Houston | 1930 | 77080 | TX | 311995 | False | 2019 | 3.0 | 0.60 | 12.650742 |
| | status | propertyType | baths | city | sqft | zipcode | state | target | Pool | Year built | r_sch_mean | dist_sch_min | target_log |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 368747 | forsale | single family | 3 | Houston | 1792 | 77080 | TX | 280000 | False | 1970 | 2.7 | 0.19 | 12.542545 |
| 368748 | undefined | single family | 2.0 | Orlando | 1829 | 32805 | FL | 171306 | False | 1962 | 2.3 | 1.10 | 12.051207 |
| 368749 | active | single detached | unknown | Fort Worth | 1895 | 76110 | TX | 199900 | False | 1921 | 5.0 | 0.50 | 12.205573 |
| 368750 | undefined | single family | 2 | Houston | 1841 | 77089 | TX | 252990 | False | 2019 | 6.0 | 0.30 | 12.441105 |
| 368751 | forsale | condo | 3 | Washington | 1417 | 20001 | DC | 799000 | False | 2010 | 3.0 | 0.10 | 13.591116 |
| 368752 | undefined | single family | 6.0 | Miami | 4017 | 33180 | FL | 1249000 | True | 1990 | 5.0 | 1.10 | 14.037854 |
| 368753 | forsale | condo | 3 | Chicago | 2000 | 60657 | IL | 674999 | False | 1924 | 4.3 | 0.40 | 13.422466 |
| 368754 | forsale | single family | 3 | Jamaica | 1152 | 11434 | NY | 528000 | False | 1950 | 4.5 | 0.48 | 13.176852 |
| 368755 | undefined | unknown | unknown | Houston | -1 | 77028 | TX | 34500 | False | None | -1.0 | 0.50 | 10.448715 |
| 368756 | undefined | single family | 2.0 | San Antonio | 1462 | 78218 | TX | 204900 | False | 2019 | 4.0 | 0.30 | 12.230277 |